Sequencing and Raw Sequence Data Quality Control    ◾    35

Figure 1.28 shows the FASTX-toolkit tools in the “bin” directory after downloading and extracting the compressed archive file.

FASTX-toolkit includes several tools for processing FASTQ files, as described in Table 1.4. You can display the usage and options of any of the executable programs by entering the program name with the “-h” option at the command-line prompt. For instance, to display the help for “fastq_quality_filter”, simply enter the following on the command line:

fastq_quality_filter -h
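The same “-h” convention can be applied to several programs in one go. The following is only a minimal sketch: the function name show_fastx_help is made up for this illustration, and it assumes the toolkit's “bin” directory has been added to the PATH. Programs that cannot be found are reported instead of being run.

```shell
# Hypothetical helper (not part of FASTX-toolkit): print the help text of
# each named program, skipping any program that is not found on the PATH.
show_fastx_help() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            "$tool" -h
        else
            echo "$tool: not found on PATH" >&2
        fi
    done
}

# Example call with two program names taken from Table 1.4.
show_fastx_help fastq_quality_filter fastq_to_fasta
```

If a program is reported as not found, the “bin” directory is probably not on the PATH yet.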

To show how FASTQ files are processed, we will download a raw FASTQ file from the NCBI SRA database and rename it for this practice. The following commands create the

TABLE 1.4  FASTX-Toolkit Programs and Descriptions

Command Name                                 Description
fastq_to_fasta                               converts FASTQ files to FASTA files
fastx_quality_stats                          reports quality statistics and nucleotide distribution
fastx_collapser                              collapses identical sequences into a single sequence
fastx_uncollapser                            expands collapsed identical sequences
fastx_trimmer                                trims reads in a FASTQ file (removing barcodes or noise)
fastx_renamer                                renames the sequence identifiers in a FASTQ/A file
fastx_clipper                                removes sequencing adapters/linkers
fasta_clipping_histogram.pl                  creates a linker clipping information histogram
fastq_quality_boxplot_graph.sh               creates a quality boxplot
fastx_nucleotide_distribution_graph.sh       creates a nucleotide distribution graph
fastx_nucleotide_distribution_line_graph.sh  creates a nucleotide distribution line graph
fastx_reverse_complement                     produces the reverse complement of each sequence
fastx_barcode_splitter.pl                    splits a FASTQ/FASTA file containing multiple samples
fasta_formatter                              changes the width of sequence lines in a FASTA file
fasta_nucleotide_changer                     converts FASTA sequences from/to RNA/DNA
fastq_quality_filter                         filters sequences based on quality
fastq_quality_trimmer                        trims (cuts) sequences based on quality
fastx_artifacts_filter                       filters sequencing artifacts from FASTQ/A files
fastq_masker                                 masks nucleotides with “N” based on quality

FIGURE 1.28  FASTX-toolkit programs.
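To make the first entry of Table 1.4 concrete, the sketch below mimics a FASTQ-to-FASTA conversion using plain awk on a tiny, made-up file. It is an illustration of the idea only, not the toolkit's implementation; the file name example.fastq, the two reads, and the function name fastq_to_fasta_sketch are all assumptions introduced here, and the sketch assumes strictly four-line FASTQ records.

```shell
# Create a tiny, made-up FASTQ file (two reads, four lines per record).
cat > example.fastq <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGGCCAA
+
!!!!IIII
EOF

# Hypothetical sketch of the conversion: keep line 1 of each record
# (identifier, with the leading '@' replaced by '>') and line 2 (sequence);
# drop the '+' separator line and the quality line.
fastq_to_fasta_sketch() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

fastq_to_fasta_sketch example.fastq
```

With the toolkit installed, the corresponding call would be along the lines of “fastq_to_fasta -i example.fastq -o example.fasta”; run “fastq_to_fasta -h” to check the options supported by your version.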